ABSTRACT


Work-integrated learning, also known as co-operative educa- 
tion, allows students to alternate between on-campus classes 
and off-campus work terms. This provides an enhanced 
learning experience for students and a talent pipeline for 
employers. We observe that co-operative job postings are a 
rich source of information about the required skills, working 
environment and company culture. We present a text min- 
ing methodology to extract and cluster informative terms 
from unstructured job descriptions, and we demonstrate the 
utility of our methodology on a co-op job posting corpus 
from a large North American university. 
1. INTRODUCTION


The World Association for Cooperative and Work-integrated 
Education reports that 275 institutions from 37 countries of- 
fer co-operative education (co-op) programs, also referred to 
as work-integrated learning programs.¹ Students enrolled in
co-op programs usually alternate between on-campus classes 
and off-campus work terms at participating employers. Co- 
operative education has become popular for a number of 
reasons: it provides an enhanced learning experience for stu- 
dents, a talent pipeline for employers, and a recruiting tool 
for institutions. 


Concurrent with the popularity of work-integrated learning 
is the desire to understand the co-op job market: students 
want to know what types of jobs are available and what 
skills could make them more employable; employers want to 
know what competition they are facing and how to attract 
top talent; and institutions want to align curricula with job 
market needs. 


¹ http://www.waceinc.org/global_institutions.html


In this paper, we propose to answer the above questions by 
mining co-operative job postings. We make two contribu- 
tions: 1) a text mining methodology to extract informative 
terms from job descriptions in order to understand a co-op 
job market, and 2) a case study using real data to demon- 
strate our methodology. 


In practice, job descriptions are written directly by em- 
ployers, and therefore they are not standardized or well- 
structured. In particular, job descriptions may include in- 
formation that is unrelated to the nature of the job such as 
website URLs, contact emails, and of course common En- 
glish words. Our technical challenge, therefore, is to extract 
and cluster useful information, such as required skills, work- 
ing environment and company culture. 


We address this challenge by designing a text mining 
methodology to understand a co-op job market through job 
postings. We start by building a parser that extracts rele- 
vant attributes from unstructured job descriptions. We then 
identify frequently occurring attributes in job titles and de- 
scriptions, and we employ Latent Semantic Analysis (LSA) 
and k-means clustering over the extracted attributes to char- 
acterize the types of available jobs. 


To demonstrate the utility of our methodology, we analyze 
nearly 30,000 co-op job postings from a large North Ameri- 
can university. We identify sought-after skills and mindsets, 
we identify the types of jobs available to junior and senior 
undergraduate students, and we discuss trends over time. 
We argue that our findings provide actionable insights for 
students, employers and the institution. 


The remainder of this paper is organized as follows.
Section 2 discusses related work; Section 3 describes our data
and methodology; Section 4 describes the experimental
results; and Section 5 concludes the paper with the
implications of our findings and directions for future work.


2. RELATED WORK 


This paper is related to three bodies of work: text mining, 
co-operative education and workforce studies. We use stan- 
dard parsing and information retrieval techniques, and do 
not make any new algorithmic contributions in text mining. 
Instead, our contribution is to apply these techniques to a 
new application domain in order to obtain new insight. 
Prior work on co-operative education has focused on its
impact on students’ skills (especially soft skills such as
leadership) and on post-graduate employment; see, e.g.,
[2] [14]. There has also been research on what makes co-op
students successful and what competencies are expected
(see, e.g., [6] [7] [16] [20]), understanding competition for
co-op jobs, and assessing the overall co-op process and
experience (see, e.g., [18]). These works are orthogonal to
ours, which studies a different problem of understanding a
co-op job market in terms of the types of available jobs and
the required skills and attitudes.


Prior research on job advertisements studied how to write 
them in order to attract qualified applicants (see, e.g., 
[22]), and how to match job descriptions with qualified 
resumes (see, e.g., [9] [29]). Moreover, job descriptions have
been studied from a gender perspective, e.g., by counting 
the occurrences of masculine and feminine words [25]. While 
these works investigated how job descriptions could attract 
or match applicants, we study a different problem of under- 
standing a co-op market through job descriptions. 


Workforce literature has applied machine learning to im- 
prove recruitment, reduce turnover and understand work 
profiles [5]. Machine learning algorithms have been applied
to understand the factors affecting work performance
and retention [5]. Furthermore, Aken et al. cluster
Information Technology job postings on job websites to understand
the work profiles prevalent in the market [1]. Our research
extends this analysis to understand the work profiles of var- 
ious industries (not only Information Technology) in a co- 
operative education setup and how they have changed over 
time. Not limiting the scope to broad work profiles, our 
research also highlights the specific skills and attitudes re- 
quired by various industries. 


3. DATA AND METHODOLOGY 


We obtained two datasets from a large undergraduate North 
American institution: 12,066 job postings corresponding to 
all co-op jobs that were advertised and filled in 2004, and 
17,057 job postings corresponding to all co-op jobs that were 
advertised and filled in 2014. The job postings are written 
in English. Most of these positions were located in North 
America, with a small number of overseas jobs. We use 
the 2014 data to characterize the current co-op job market 
and we compare with the 2004 data to analyze trends over 
time. Each record in our datasets contains the following 
information: 


• A job title, up to 50 characters long, which generally
consists of the position and/or the nature of the work.
Common titles include Web Developer, Engineering
Intern and Planning Assistant.

• A job description, with unlimited length and no
standardized structure or formatting.

• The year of study of the successful candidate who
secured the job. We refer to jobs obtained by first and
second year students as junior jobs or lower-year jobs,
and those obtained by third and fourth year students
as senior jobs or upper-year jobs.

• The academic program of the successful candidate.


Note: EMPLOYMENT BASED IN THE USA* This work opportunity will be based in the USA; therefore all
applicants must determine whether they are eligible to work in the USA.

Aqua Book Club (ABC), is a global eReading service <href=www.abc.ca. Ranked 1st in Bloomberg
Magazine’s annual ranking of startups, we have a strong employee culture that promotes teamwork and open
communication.


ABC is looking for Javascrip/ HTML5/CSS/RoR experts who are obsessed with technology and who love what
they do. As part of our small team of software engineers, you will be responsible for architecting and
implementing the UI designs, and working with other members on the team to integrate the the application into
our platform.Deep understanding of the front end web, from delivery to working with Ajax is required.
Experience in Ruby on Rails or other MVC web frameworks is a plus.


Applications are due by 05/30/2014 12 a.m. Applications wont be accepted after that. Attaching a transcript is
highly recommended. (Include #503482 in the name) - Currently enrolled in BASc or CS at the Intermediate
level with the Co-op option - Students who have taken cs326 will be prefered


At ABC, you will get a chance to work closely with the CEO Tim while having the flexibility you need to make
a real contribution to our system. If you have a past history of excellence, are un-put by challenges, are a team-
player and have demonstrated ability to learn rapidly on the job, we want to talk to you. Other perks: - Get to
work on really challenging and diverse problems in a casual environment. - We have a ping-pong and a foosball
table (We will surely beat you in ping pong)! - A well stocked fridge - free lunch on release days!!! ie we’re
basicaly a really F*U*N place to work. The office is located downtown and is easily reached by TTC.


Join us for the Evening Happy Hour on Friday, May 23rd 2014, 7:30 pm. Check out the Facebook event page
here: https://www.facebook.com/events/573997/.

########## ---------- Feel free to contact Ruby
Smith (rsmith@abc.com) or Jason Pinn (jason@abc.com) for any questions you have about working at ABC.


*** Apply asap!***


Figure 1: An anonymized job description


Since the job postings in our dataset do not include indus- 
try or discipline labels, we use the academic program of the 
student who obtained the job as a proxy. The institution 
provided us with a mapping from students’ academic pro- 
grams to job disciplines; e.g., positions filled by Computer 
Science or Software Engineering students are classified as 
Information Technology jobs. In our case study, we focus on 
the largest discipline in the institution’s co-op market: Infor- 
mation Technologies (IT). We also point out interesting find- 
ings from other major disciplines: Finance, Health Studies, 
Arts, Biology, Environmental Studies, Chemical Engineer- 
ing, Civil Engineering, Electrical Engineering and Mechani- 
cal Engineering. 


Figure 1 shows an anonymized example of a job description
from our dataset. It includes the following information:


• Technical skills: Javascript, Ruby on Rails

• Soft skills: team player, ability to learn

• Job duties: architecting and implementing UI designs

• Desired mindset and attitude: obsessed with technology

• Perks: ping-pong and foosball table, free lunch

• Company culture: casual environment


However, there is also some content that does not describe 
the job itself: names of people and locations, URLs, email 
addresses, HTML tags, timestamps, special formatting, and, 
of course, common English words. The first part of our 
methodology, therefore, is a parser that extracts job-related 
attributes from unstructured job descriptions. The parser, 
implemented in Python, consists of the following steps. 
1. Using regular expression matching, we remove URLs,
HTML tags, phone numbers and other numbers,
email addresses, timestamps, administrative annotations
added by the institution (such as the text following
“Note:” in Figure 1), formatting characters such as
bullet points, and sequences of special characters serving
as separators (such as the sequences of dashes and
hashtags in Figure 1).


2. We tokenize the remaining text and remove special
characters embedded in words (such as F*U*N in
Figure 1). To remove unimportant terms, we build a
vocabulary, called Remove-List, consisting of common
English words,² misspellings³ and abbreviations⁴ as
well as manually-curated lists of company names,
locations, addresses and persons’ names appearing in the
institution’s co-op system.


3. We have to be careful to not remove informative terms.
For example, “Ajax” is a city in Canada and is therefore
in Remove-List. However, Ajax is also a Web
development toolkit. To address this problem, we create
another vocabulary called Keep-List, of words that
should not be removed. This vocabulary consists of
skills found on a resume help Web site⁵ and job duties
from the Canadian National Occupational
Classification.⁶ Note that Keep-List only contains a subset
of words we are interested in; e.g., it is missing many
specific technical skills, perks and company culture
descriptors.


4. We stem the remaining tokens using the NLTK Snowball
stemmer⁷ and we remove stop words. Finally, we
leverage our domain knowledge by converting important
terms that can be written in different ways into a
standard form; e.g., “java-script” and “javascript” both
map to “javascript”.


At the end of the parsing process, each job description is 
reduced to its stemmed words, minus those in Remove-List 
but not in Keep-List. In the remainder of the paper, we will 
refer to these stemmed words as “words”, “terms”, “tokens” 
and “attributes” interchangeably. 
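
The four parsing steps above can be sketched roughly as follows. This is a minimal illustration and not the authors' parser: the regular expressions, the tiny REMOVE_LIST, KEEP_LIST and CANONICAL vocabularies, and the function name parse_description are all assumptions made for the example. The stemmer is passed in as a parameter (identity by default); NLTK's SnowballStemmer("english").stem could be supplied there.

```python
import re

# Hypothetical, tiny vocabularies for illustration only; the Remove-List and
# Keep-List described in the text are much larger and externally sourced.
REMOVE_LIST = {"ajax", "toronto", "tim", "the", "is", "in", "and", "a", "with"}
KEEP_LIST = {"ajax"}
CANONICAL = {"java-script": "javascript"}  # assumed normalization table

# Assumed patterns, not the expressions used in the paper.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
EMAIL_RE = re.compile(r"\S+@\S+")
TAG_RE = re.compile(r"<[^>]*>")
SEPARATOR_RE = re.compile(r"[-#*=_]{3,}")
NUMBER_RE = re.compile(r"\b\d[\d/:.,-]*\b")


def parse_description(text, stem=lambda token: token):
    """Reduce one job description to a list of normalized tokens."""
    # Step 1: strip URLs, emails, HTML tags, separator runs and numbers.
    for pattern in (URL_RE, EMAIL_RE, TAG_RE, SEPARATOR_RE, NUMBER_RE):
        text = pattern.sub(" ", text)
    tokens = []
    for raw in text.lower().split():
        # Step 2: drop special characters embedded in words (F*U*N -> fun),
        # keeping the characters needed for skills such as c++ and c#.
        token = re.sub(r"[^a-z0-9+#-]", "", raw).strip("-")
        token = CANONICAL.get(token, token)  # map variants to a standard form
        if not token:
            continue
        # Step 3: a Keep-List entry overrides the Remove-List.
        if token in REMOVE_LIST and token not in KEEP_LIST:
            continue
        # Step 4: stem whatever survives.
        tokens.append(stem(token))
    return tokens
```

In this sketch, normalization to a standard form happens before stemming for simplicity; the ordering in a real pipeline may differ.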


² http://www.lextutor.ca/freq/lists_download/
³ common_misspellings/For_machines
⁴ https://media.gcflearnfree.org/ctassets/modules/
⁵ https://www.thebalance.com/
⁶ http://noc.esdc.gc.ca/English/noc/welcome.aspx?ver=16
⁷ www.nltk.org/_modules/nltk/stem/snowball.html


Table 1: Top 10 frequent tokens in IT job titles

Token       Freq. in 2014    Token        Freq. in 2004
softwar     45%              develop      37%
develop     44%              softwar      27%
analyst     8%               analyst      17%
applic      7%               programm     11%
web         5%               assist       9%
support     4%               web          8%
assist      4%               support      7%
programm    4%               applic       6%
system      3%               system       6%
quality     3%               specialist   4%


The second part of our methodology is designed to analyze
the extracted job attributes. We do this in two ways:


1. To identify popular skills, attitudes, working environment
and perks, we report attributes that occur at
least once in a large percentage of job descriptions.
Notably, and in contrast to other text mining applications,
we do not count the number of occurrences of an
attribute within a posting, because we observed that important
job requirements such as knowledge of the “Java”
programming language are usually mentioned only once.
We also identify attributes mentioned by more junior
than senior jobs (and vice versa), and we compare attributes
mentioned by more jobs in 2014 than 2004
(and vice versa) to characterize trends over time.


2. We use clustering to identify the different types of
available co-op jobs within a discipline. Following previous
work on text clustering [24], we start by
applying Latent Semantic Analysis (LSA) to the job
descriptions, with each job description represented as
a job vector. The ith coordinate of a job vector is
equal to the inverse document frequency (IDF) of the
ith word in the set of possible words, provided that
this word is mentioned in the given job description at
least once (and zero otherwise). Following previous
work, we use LSA to reduce the dimensionality of job
vectors from the number of distinct words down to one
hundred [28]. Each reduced dimension corresponds to
a latent concept in the data. We then run k-means
clustering on the transformed job vectors, and we report
a few top terms (again, ranked by IDF) from each
cluster centroid as representatives.
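
The job-vector construction and the k-means step described above can be illustrated with the standard-library sketch below. Several things here are assumptions rather than details from the paper: IDF is taken to be log(N / document frequency), k-means is seeded deterministically with a farthest-point rule, and cluster representatives are ranked by centroid weight. The LSA reduction is deliberately omitted to keep the sketch dependency-free; in practice a truncated SVD (for example, scikit-learn's TruncatedSVD with 100 components) would be applied to the vectors before clustering.

```python
import math
from collections import Counter


def job_vectors(docs):
    """Return (vocab, vectors); coordinate i is IDF(word i) if present, else 0."""
    df = Counter(tok for tokens in docs for tok in set(tokens))
    vocab = sorted(df)
    # Assumed IDF definition: log(N / document frequency).
    idf = {tok: math.log(len(docs) / df[tok]) for tok in vocab}
    vectors = []
    for tokens in docs:
        present = set(tokens)  # presence only; repeated mentions are ignored
        vectors.append([idf[tok] if tok in present else 0.0 for tok in vocab])
    return vocab, vectors


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing centroid.
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(centroid, vocab, n=3):
    """Terms with the largest centroid weights, used as cluster representatives."""
    ranked = sorted(zip(centroid, vocab), key=lambda pair: (-pair[0], pair[1]))
    return [tok for _, tok in ranked[:n]]
```

Note that under this IDF definition a token appearing in every posting receives weight zero, so ubiquitous terms do not influence the clustering.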


4. RESULTS 

In this section, we demonstrate the utility of our methodology.
We show in-depth results for the largest discipline in
our dataset: Information Technologies (IT), including
frequent term analysis (Section 4.1), analysis of significant
differences in term frequencies between 2014 and 2004 and
between senior and junior jobs (Section 4.2), and clustering
analysis (Section 4.3). We summarize our results for other
disciplines in Section 4.4.


4.1 Frequent Term Analysis 

Table 1 shows the top 10 attributes occurring in the most
IT job titles in 2014 and 2004; for example, the first row
indicates that the token “softwar” appears at least once in
45% of job titles in 2014, whereas the most frequent token
in 2004, “develop”, appears in 37% of job titles. Not surprisingly,
nearly half of the 2014 titles mention software development.
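
The percentages reported in this section are document frequencies: the share of postings (or titles) in which a token appears at least once. A minimal sketch of that computation, with hypothetical function names and toy input, might look like this:

```python
from collections import Counter


def document_frequency(docs):
    """Map each token to the percentage of documents that contain it."""
    counts = Counter()
    for tokens in docs:
        counts.update(set(tokens))  # set(): count each posting at most once
    return {tok: 100.0 * n / len(docs) for tok, n in counts.items()}


def top_tokens(docs, k=10):
    """Return the k tokens that appear in the most documents."""
    freq = document_frequency(docs)
    # Sort by descending frequency, breaking ties alphabetically.
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))[:k]
```

Using a set per posting is what distinguishes this measure from a raw term count: a title that repeats a token still contributes only once.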


Table 2 shows the top 25 attributes occurring in the most IT
job descriptions in 2014 and 2004. Overall, most IT co-op
jobs appear to be software developer jobs. In 2014, hardware
was mentioned in only 14% of the postings and embedded
systems in 7%; in 2004, these percentages were slightly
higher, at 22% and 9%, respectively (and the actual number of
hardware and embedded systems jobs was slightly higher in
2004). Furthermore, about half the job descriptions mention
testing. Notably, mentions of some soft skills such as
communication are more frequent than mentions of specific
technical skills such as Java in both years.


Table 2: Top 25 frequent attributes in IT job descriptions

Token       Freq. in 2014    Token       Freq. in 2004
develop     91%              develop     80%
team        84%              applic      65%
softwar     76%              softwar     62%
applic      66%              system      61%
design      65%              team        61%
product     62%              program     54%
program     60%              design      53%
system      58%              communic    50%
project     53%              comput      49%
comput      52%              product     47%
test        50%              support     43%
build       48%              test        43%
communic    48%              servic      42%
web         47%              project     41%
code        46%              lead        39%
help        46%              excel       39%
learn       45%              solut       38%
servic      44%              web         38%
java        43%              tool        37%
manag       43%              assist      36%
creat       43%              busi        36%
solut       42%              manag       35%
technic     42%              java        35%
tool        41%              custom      34%
excel       40%              oper        33%


By inspecting other frequent attributes, we obtain the fol- 
lowing insights about frequently mentioned programming 
languages, platforms and applications in 2014: 


• Programming languages: Java (43%), C++ (33%),
JavaScript (31%), C (24%), Python (22%), C# (20%),
HTML (19%), CSS (17%), PHP (12%), .NET (12%),
jQuery (10%), Perl (10%), XML (9%), Ruby (9%)

• Development: web (47%), mobile (32%), game (12%)

• Databases: database (29%), SQL (26%), mySQL (8%),
Oracle (7%)

• Mobile applications: android (19%), iPhone (7%)

• Operating Systems: linux (21%), unix (13%), iOS
(14%)

• User-centered development: user (35%), agile (18%),
deploy (16%)

• Other applications: server (29%), distributed (17%),
security (17%), cloud (9%), graphic development (8%),
big data (4%)

• Concepts: OOP (Object-Oriented Programming) (24%),
algorithms (18%), scalable (14%)


In terms of the working environment and company culture, 
the strongest result is that the word “team” is very fre- 
quent, suggesting a collaborative environment. Other fre- 
quent terms include challenging (32%), dynamic (20%), fun 
(16%), flexible (15%) and diverse (12%). Amenities such as 
free food, foosball and ping-pong tables are also frequent. 
The word start-up is mentioned in 11% of the job postings. 


We also note the occurrence of mindset-related terms such as 
learn (45%), innovation (32%), passion (25%), focus (23%), 
creativity (22%), motivation (20%), love (15%) and enjoy 
(10%). 


Similarly, for 2004, we identify the following frequently men- 
tioned programming languages, platforms and applications: 


• Programming languages: Java (35%), C++ (31%), C
(21%), HTML (22%), XML (15%), ASP.NET (12%),
Perl (11%), .NET (10%), JavaScript (10%), JSP (8%),
C# (7%)

• Development: web (38%), mobile (10%), game (5%)

• Databases: database (30%), SQL (27%), Oracle
(13%), mySQL (2%)

• Operating Systems: unix (22%), linux (15%)

• User-centered development: user (21%), deploy (7%),
agile (0.5%)

• Other applications: server (25%), security (15%),
graphic development (10%)

• Concepts: OOP (13%), algorithms (7%), scalable (4%)


As in 2014, the word “team” was frequent in 2004, but
words related to mindset, company culture and perks were
less frequent than in 2014.


Our results indicate that IT positions focus on software
rather than hardware, especially web and Java
development. The work environment appears team-oriented.
In 2014, descriptions of mindset and company
culture appear frequently.


4.2 Significant Differences 

Next, we investigate the differences between 2014 and 2004
IT job descriptions that we began to see in the previous
section. Table 3 summarizes the results by listing the 20
attributes with the largest differences in frequency: those
more frequent in 2004 than in 2014 (on the left), and those
more frequent in 2014 than in 2004 (on the right). We define
a difference in frequencies, abbreviated Δ, as the percentage
of job postings mentioning an attribute in one year minus the
percentage of job postings mentioning this attribute in the
other year. Both lists are sorted by Δ, and all results shown
are statistically significant with p-values less than 0.05 using
a proportion test [13]. We omit the analysis of job title
differences between 2004 and 2014, which gave similar results.
We also show a Venn diagram in Figure 2, which illustrates
the overlap among the top 100 frequent attributes in 2004
and 2014 IT jobs.
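
As an illustration of the difference in frequencies and its significance check, the sketch below computes the difference in percentage points together with a p-value. The text only says "a proportion test", so the specific choice here, a two-sided two-proportion z-test with a pooled standard error, is an assumption, as are the function name and the counts in the usage example.

```python
import math


def frequency_difference(x1, n1, x2, n2):
    """Return (difference in percentage points, two-sided p-value).

    x1 of n1 postings mention the attribute in one group and x2 of n2
    postings mention it in the other group.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    delta = 100 * (p1 - p2)
    if se == 0:  # attribute present in all postings or in none
        return delta, 1.0
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return delta, p_value
```

For example, with hypothetical counts of 480 mentions in 1,000 postings versus 150 in 1,000, frequency_difference(480, 1000, 150, 1000) gives a difference of about 33 percentage points with a p-value far below 0.05.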
Table 3: Differences in frequency between job description
attributes of 2004 and 2014 IT

Token         2004   2014   Δ       Token        2014   2004   Δ
assist        36%    22%    14%     build        48%    15%    33%
asp           12%    2%     10%     help         46%    19%    26%
internet      18%    9%     9%      team         84%    61%    24%
unix          22%    13%    8%      code         46%    24%    22%
hardwar       22%    14%    8%      mobil        32%    10%    22%
sort          8%     0%     8%      javascript   31%    10%    21%
clarifi       8%     1%     8%      passion      25%    5%     20%
interperson   18%    10%    7%      featur       30%    10%    20%
oper          33%    26%    7%      creat        43%    23%    20%
msaccess      8%     1%     7%      python       22%    3%     19%
manufactur    10%    4%     6%      learn        45%    26%    19%
cost          11%    5%     6%      collabor     23%    5%     18%
xml           15%    9%     6%      agil         18%    0%     18%
support       43%    37%    6%      product      62%    47%    16%
expens        8%     2%     6%      contribut    27%    12%    15%
intranet      7%     1%     6%      problem      34%    19%    15%
oracl         13%    7%     5%      improv       25%    10%    15%
prepar        11%    6%     5%      solv         20%    6%     15%
supervis      12%    7%     5%      app          15%    1%     14%
xp            8%     3%     5%      peopl        33%    18%    14%


Table 4: Differences in frequency between job description
attributes of junior and senior jobs in 2014 IT

Token          Jr.    Sr.    Δ       Token       Sr.    Jr.    Δ
document       29%    16%    13%     c++         46%    21%    24%
support        42%    31%    11%     algorithm   28%    9%     20%
assist         27%    16%    11%     scale       28%    9%     19%
communic       53%    43%    10%     scienc      49%    31%    17%
manag          48%    38%    10%     featur      39%    22%    17%
test           54%    45%    9%      python      31%    14%    16%
report         26%    17%    9%      scalabl     23%    7%     16%
busi           42%    34%    9%      build       57%    41%    15%
written        21%    13%    8%      code        54%    40%    15%
activ          23%    15%    8%      complex     27%    13%    13%
educ           17%    10%    7%      comput      59%    46%    13%
standard       15%    8%     7%      c           31%    18%    13%
interperson    13%    6%     7%      product     69%    57%    13%
instal         9%     3%     7%      structur    21%    9%     12%
troubleshoot   15%    9%     6%      field       23%    11%    12%
msoffic        8%     2%     6%      java        50%    38%    12%
summari        24%    18%    6%      data        42%    30%    12%
execut         15%    9%     6%      distribut   23%    12%    11%
detail         11%    5%     6%      search      16%    6%     10%
account        12%    6%     6%      problem     40%    29%    10%
Figure 2: Overlap between the top 100 most frequent
attributes of IT jobs in 2004 and 2014 


Our results suggest that 2004 job postings include more 
entry-level positions (suggested by attributes such as “as- 
sist”, “support”, “prepare”, “arrange” and “document”), and 
mention technologies and software popular at the time such 
as ASP, XML, Windows XP and Microsoft Access. Addi- 
tionally, the fraction of hardware-oriented jobs was higher in 
2004. On the other hand, job postings in 2014 include words 
representing current technologies such as mobile, Javascript, 
Python, agile and app (and, further in the list, scalable and 
distributed systems). Notably, many soft skills and
mindset-related terms are more frequent in 2014: “passion”,
“create”, “learn”, “collaborate” and “contribute”. Although not
shown in Table 3, other terms that are more frequent in
2014 include company culture descriptors such as
“innovative”, “challenging”, “fun” and “diverse”.


Figure 3: Overlap between the top 100 most frequent
attributes of Junior and Senior IT jobs in 2014


The next important difference is that between junior and
senior jobs. Table 4 shows two lists: top terms appearing in
more junior than senior jobs (on the left), and top terms
appearing in more senior than junior jobs (on the right), both
in 2014 and both sorted by the difference of percentages.
Table 5 shows the same two lists, but for 2004. Figures 3 and 4
show Venn diagrams that illustrate the overlap among the
top 100 frequent terms from junior and senior jobs in 2014
and 2004, respectively.
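
The overlaps shown in the Venn diagrams amount to set operations on two top-k term lists. A minimal sketch, assuming each group is summarized as a hypothetical {token: percentage} mapping:

```python
def top_k(freq, k=100):
    """Return the set of the k most frequent tokens in a {token: percentage} map."""
    ranked = sorted(freq, key=lambda tok: (-freq[tok], tok))
    return set(ranked[:k])


def overlap(freq_a, freq_b, k=100):
    """Return (only in A, shared, only in B) for the two top-k token sets."""
    a, b = top_k(freq_a, k), top_k(freq_b, k)
    return a - b, a & b, b - a
```

The three returned sets correspond to the left region, the intersection and the right region of a two-set Venn diagram.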


We observe that in 2014, junior jobs are more likely to be
entry-level documentation, testing or troubleshooting jobs.
Junior job postings are more likely to mention soft skills such
as communication and interpersonal skills. In terms of
specific technologies, junior jobs mention HTML, SQL and Web
5 percentage points more frequently than senior jobs. On the
other hand, senior jobs in 2014 mention technical concepts and
specific programming languages such as algorithms,
scalability, data, C++, C and Python. Other interesting
differences not shown in the table are OOP (9% more frequent
than in junior jobs), linux (8%), cloud (8%), security (7%)
and data science (5%).


Table 5: Differences in frequency between job description
attributes of junior and senior jobs in 2004 IT

Token          Jr.    Sr.    Δ       Token       Sr.    Jr.    Δ
maintain       23%    14%    9%      c++         45%    21%    24%
support        47%    38%    9%      c           32%    14%    18%
updat          13%    5%     8%      design      63%    46%    17%
html           26%    18%    8%      cost        20%    5%     14%
excel          43%    35%    8%      clarifi     16%    2%     14%
msoffic        13%    5%     8%      expens      16%    2%     14%
troubleshoot   14%    7%     8%      arrang      17%    4%     13%
document       30%    23%    7%      sort        16%    3%     13%
user           24%    17%    7%      solut       44%    33%    11%
qualiti        26%    20%    6%      challeng    29%    19%    10%
report         26%    20%    6%      develop     85%    76%    9%
web            40%    34%    6%      linux       20%    11%    9%
mainten        14%    8%     6%      complex     16%    7%     9%
instal         13%    7%     6%      algorithm   12%    3%     9%
interperson    20%    15%    5%      unix        27%    18%    9%
server         27%    22%    5%      code        29%    21%    8%
hardwar        24%    19%    5%      lead        44%    36%    8%
xp             10%    5%     5%      innov       27%    19%    8%
time           31%    26%    5%      oop         18%    10%    8%
offic          21%    17%    5%      scale       11%    3%     8%


[Venn diagram: 2004 Lower Yr IT, 2004 Upper Yr IT]

Figure 4: Overlap between the top 100 most frequent
attributes of Junior and Senior IT jobs in 2004


We observe similar patterns in Table 5 and Figure 4. In
2004, junior jobs also included terms suggesting entry-level
positions, whereas senior jobs included more mentions of
programming languages and computing concepts.


To summarize, there are clear differences between 
2014 and 2004 IT jobs, and between junior and se- 
nior jobs. In addition to differences due to new tech- 
nologies, soft skills, mindset and company culture are 
more frequently mentioned in 2014. In both years, 
junior IT jobs are more likely to mention documenta- 
tion, testing and troubleshooting, whereas senior jobs 
are more likely to mention technical concepts. 


Table 6: Largest clusters of 2014 IT jobs

Label                   Tokens in cluster centroid                      %All   %Jr.   %Sr.

Web Development         javascript, html, web, css, sql, c#, server,    22%    64%    36%
                        java, net, jquery

Programming             c++, c, languag, linux, python, oop, scienc,    21%    46%    54%
                        algorithm, perl, script

Start-ups               startup, python, javascript, featur, code,      18%    39%    61%
                        web, love, stack, fun, passion

Business Analyst        sql, analyst, test, solut, c#, script,          16%    69%    31%
                        execut, financi, document, busi

Mobile Development      ios, android, mobil, app, platform, java,       10%    61%    39%
                        agil, iphon, devic, c

System Administration   hardwar, troubleshoot, configur, instal,        6%     87%    13%
                        network, server, user, xp, resolut


4.3 Clustering Analysis 

After investigating frequently occurring terms, we now clus- 
ter the IT job descriptions to understand the types of avail- 
able jobs. We experimented with different numbers of clus- 
ters between 2 and 30. We present results using ten clusters; 
using fewer clusters led to different types of jobs being as- 
signed to the same cluster, whereas using more clusters led 
to similar types of jobs belonging to multiple clusters. 


Table 6 shows the six largest clusters in 2014 sorted by size;
the remaining four clusters had under 2% of the total num- 
ber of jobs each. We report the representative tokens of each 
cluster centroid, a manually-assigned label summarizing the 
tokens, and three percentages: the percentage of all jobs 
assigned to this cluster, and the percentages of junior and 
senior jobs within this cluster. We highlight the higher of 
the last two percentages in bold font to indicate whether a 
cluster consists of more junior or senior jobs. 
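
The three percentages reported per cluster can be derived directly from the cluster assignments and the junior/senior flag of each posting. The sketch below is an illustration only; the function name and the input layout (parallel lists of cluster labels and booleans) are assumptions.

```python
from collections import Counter


def cluster_summary(labels, is_junior):
    """Return {cluster: (%all, %junior, %senior)} as whole percentages."""
    sizes = Counter(labels)
    juniors = Counter(lab for lab, jr in zip(labels, is_junior) if jr)
    total = len(labels)
    summary = {}
    for cluster, size in sizes.items():
        pct_junior = 100 * juniors[cluster] / size
        summary[cluster] = (
            round(100 * size / total),  # share of all jobs in this cluster
            round(pct_junior),          # junior share within the cluster
            round(100 - pct_junior),    # senior share within the cluster
        )
    return summary
```

The first percentage is relative to all postings, while the last two are relative to the cluster itself and therefore sum to 100.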


Based on the clustering results, we characterize the IT co- 
op market as follows. The five largest clusters cover 87% 
of IT jobs, spanning web development (22%), programming 
(21%), start-ups (18%), business analysis (16%) and mo- 
bile development (10%). The junior vs. senior split evident 
in the clustering is consistent with our earlier results from 
Section 4.2: troubleshooting jobs are mostly filled by junior
students, whereas jobs mentioning company culture, many 
of which are startups, are filled by senior students. 


Table 7 shows the seven largest IT clusters in 2004; the
remaining three clusters are small and one of them contains job
postings from a specific large employer at the time. In 2004,
there is no start-up cluster with mentions of the working
environment, and there is an emphasis on hardware in the
last cluster. These results align with our earlier results from
Section 4.2.


Table 7: Largest clusters of 2004 IT jobs

Label                    Tokens in cluster centroid                     %All   %Jr.   %Sr.

Software Development     java, test, sql, tool, server, qualiti,        20%    61%    39%
and Testing              softwar, autom, custom, solut

Web Development          html, sql, web, asp, javascript, server,       19%    66%    34%
                         java, xml, databas, net

Databases                scienc, databas, model, comput, analysi,       15%    57%    43%
                         group, research, data, tool, msaccess

[illegible]              c++, sort, clarifi, expens, arrang, cost,      15%    40%    60%
                         gui, code, java, softwar

System Administration    network, hardwar, troubleshoot, instal,        9%     87%    13%
                         user, configur, xp, desktop, msoffic,
                         problem

Programming              perl, script, languag, unix, c, java, rank,    7%     57%    43%
                         c++, enterpris, linux

Embedded Systems and     video, digit, hardwar, c, multimedia,          7%     36%    64%
Graphics                 debug, embed, devic, c++, graphic


To summarize, our clustering methodology segments
the IT job market into web development jobs, general
software development jobs, data analysis jobs, mobile
development jobs and troubleshooting jobs. Mentions
of mindset and work environments in 2014 are frequent
enough to create a separate cluster for these
jobs.


4.4 Analysis of Other Disciplines 

In this section, we apply our text mining methodology to 
the other disciplines in our dataset. As before, we structure 
the results into frequent term analysis, difference analysis 
(2014 vs. 2004 and junior vs. senior jobs), and clustering 
analysis to characterize the types of available jobs within 
each discipline. We focus on job description analysis and 
only mention the results of job title analysis if they lead to 
additional insight. 


4.4.1 Frequent Term Analysis 

Overall, all the other disciplines have frequent mentions of 
soft skills (“team”, “communication”, “leadership”) and basic 
computing skills (databases and Microsoft Office) in both 
2004 and 2014. Below, we highlight additional frequent 
terms for each discipline. 


Finance: soft skills indicating client relationships (“client”, 
“interpersonal”, “relationship”); finance-specific technical 
skills (“audit”, “tax”, risk assessment, asset valuation,
market analysis); formal office working environment (“bank”,
“office”) 


Health Studies: soft skills (“active students”, indicating
physical fitness); health-specific terms (“patient”,
“care”, “kinesiology”, “therapy”, “injury”, “rehabilitation”,
“ergonomics”, “physiotherapy”, “recreation”)


Arts: tokens related to editorial, technical and content writing
(“edit”, “write”, “english”, “proofread”, “content”); additionally,
media and social media were frequently mentioned in 2014.


Biology: discipline-specific technical terms (“molecular”,
“chemistry”, “microbiology”, “biochemistry”, “disease”,
“cell”, “tissue”, “DNA”, “genetics”, “pharmaceutical”);
lab-oriented work environment (“research”, “lab”, “technician”)


Environmental Studies: discipline-specific terms (GIS
(Geographic Information System), “water”, “land”, “soil”,
“map”, “survey”, “sample”, “policy”); field work environment
(“field”, “site”). Frequent words in job titles: “assistant”,
“planner”, “technician”, “research”, “analyst”, “inspector”,
“project”, “management”.


Chemical Engineering:  Discipline-specific technical 
terms (“chemistry”, “process”, “manufacturing”, “equip- 
ment”, “sample”, “procedure”, process improvement, 
“safety”); lab-oriented work environment (chemical plants, 
research labs). Additionally, frequent in 2014: project man- 
agement; frequent in 2004: field-work. 


Civil Engineering: construction-related tokens (“design”, 
“AutoCAD”, “site”, “field”, “concrete”, “safety”); graphic de- 
sign (“graphic”, “PhotoShop”). 


Electrical Engineering: discipline-specific technical skills
(“electrical”, “hardware”, “power”, “schematic”, “control”,
“embedded”, “circuit”); computing skills (“code”, Web, Java,
SQL). Frequent terms in job titles: “design”, “quality”,
“assurance”, “testing”, “research”.


Mechanical Engineering: discipline-specific terms
(“equipment”, “assembly”, “robot”, “circuit”, “material”,
“CAD”, “SolidWorks”, “AutoCAD”, “control”, “process”,
“improvement”, “maintenance”, “draw”, “prototype”, “test”,
“troubleshoot”, “safety”); work environment (“plant”, “shop”,
“floor”, “manufacturing”).
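

The frequent term lists above can be approximated by ranking terms by document frequency, that is, the fraction of postings that mention a term at least once. The following is a minimal sketch under stated assumptions, not the implementation used in this paper: the stop-word list, the tokenizer, the function names and the three sample postings are all hypothetical, and stemming is omitted for brevity.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual preprocessing
# (including stemming) is not reproduced here.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "is", "on"}


def tokenize(text):
    """Lowercase a posting and return its set of distinct non-stop-word tokens."""
    words = re.findall(r"[a-z][a-z+#]*", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def document_frequency(postings):
    """Map each term to the fraction of postings that mention it at least once."""
    counts = Counter()
    for text in postings:
        counts.update(tokenize(text))
    n = len(postings)
    return {term: c / n for term, c in counts.items()}


def top_terms(postings, k=5):
    """Return the k most widely mentioned terms, ties broken alphabetically."""
    df = document_frequency(postings)
    return sorted(df.items(), key=lambda item: (-item[1], item[0]))[:k]


# Made-up postings, for illustration only.
postings = [
    "Assist the audit team with client tax files",
    "Strong communication and team skills; tax experience an asset",
    "Work in a team on risk assessment for bank clients",
]
for term, share in top_terms(postings, k=3):
    print(f"{term}: mentioned in {share:.0%} of postings")
```

Counting each term once per posting, rather than counting every occurrence, keeps a single long posting from dominating the ranking.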


4.4.2 Significant Differences 

Next, we highlight differences in frequent terms between 
2004 and 2014. Overall, we observed that each discipline had 
more mentions of soft skills, and more mentions of project 
management and IT-related terms in 2014. Additional dif- 
ferences are summarized below for each discipline. 


Finance: 2004 jobs mention actuarial science more; 2014 
jobs mention risk management and assessment, “equity”, 
“trade”, “client” and “interaction” more. Additionally, 2014 
jobs mention concepts related to data analysis (e.g., Mi- 
crosoft Excel and VBA). 


Health Studies: 2014 jobs include more research related 
terms: “research”, “summary”, “data”, “review”, “cancer”. 
2004 jobs have more mentions of “recreation”, “kinesiology”, 
“outdoor”, “therapy” and “teach”. In particular, “cancer” ap- 
pears in 6% more job postings in 2014 than in 2004. 


Arts: more 2014 jobs mention market analysis and media-related
terms: “media”, “project”, “management”, “PowerPoint”,
“client” and “relationship”. 2004 jobs mention more
writing-related terms such as “history”, “newsletter”,
“proofread”, “French” and HTML.


Biology: 2014 job postings include more research and
project management positions, and mention computing
skills and clinic more often. 2004 job postings mention
laboratory terms including “technique”, “microbiology”,
“sample”, “gel”, “biochemistry”, “microbe”, HPLC (High
Performance Liquid Chromatography blood test), “bacteria”.


Environmental Studies: 2014 jobs mention project management,
clients, research and computing skills more often.
2004 jobs mention “educate”, “air”, “waste”, “treatment”,
“recycle” and ground water. It is interesting to note that
“sustain” (sustainability) is mentioned 7% more often in 2014
than in 2004.


Chemical Engineering: 2014 jobs mention project management
terms (e.g., “manage”, “report”, “project”, “maintain”),
“safety”, “energy”, “oil”, “gas”, “petroleum” and “sand”
more often than 2004 jobs. On the other hand, 2004 jobs
mention more computing skills and laboratory-specific terms
(“lab”, “technician”, “sample”, “treatment”).


Civil Engineering: 2014 jobs mention more software 
(“software” and “AutoCAD” are mentioned 21% and 8% 
more often, respectively, in 2014 than in 2004). 2004 jobs 
mention “cost” and “expense” more often than 2014 jobs. It 
is interesting to note that “safety” is mentioned 13% more 
often in 2014 than in 2004. 


Electrical Engineering: 2014 jobs mention “passion” and
computing skills related to web development, core programming
languages and mobile development. 2004 jobs mention
more “manufacturing”, “graphic”, “multimedia”, “processor”,
“hardware”, “VHDL” (a hardware description language) and
“Unix”.


Mechanical Engineering: 2014 jobs mention research 
(suggested by “lab”, “research”, “simulate”, “electron”), 
client-oriented development (“client”, “customize”) and com- 
puting terms (Python, Java, “mobile”). 2004 jobs are more 
likely to mention mechanical engineering terms: “blueprint”, 
“draw”, “cost”, “weld”, “hydraulics”, “gear”. It is interesting 
to note that “quality” is mentioned 9% more in 2014 than 
in 2004. While both AutoCAD and SolidWorks are CAD 
software, SolidWorks is mentioned 11% more in 2014 while 
AutoCAD is mentioned 5% more in 2004. 
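

The year-over-year comparisons above amount to differences in mention rates between two sets of postings. The sketch below illustrates that arithmetic under stated assumptions: the function names, the 5-point threshold `min_gap` and the toy postings are hypothetical, no statistical significance test is applied, and whole-word matching is used, so terms containing symbols (such as “c++”) would need a different matcher.

```python
import re


def mention_rate(postings, term):
    """Fraction of postings that contain the term as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = sum(1 for text in postings if pattern.search(text))
    return hits / len(postings)


def rate_differences(old_postings, new_postings, terms, min_gap=0.05):
    """Return {term: new_rate - old_rate}, keeping only gaps of at least min_gap."""
    gaps = {}
    for term in terms:
        gap = mention_rate(new_postings, term) - mention_rate(old_postings, term)
        if abs(gap) >= min_gap:
            gaps[term] = gap
    return gaps


# Made-up postings, for illustration only.
postings_2004 = [
    "Prepare cost estimates for the site",
    "Track cost and expense reports",
    "Join the design team",
    "Support the team on site design",
]
postings_2014 = [
    "Work in a team using AutoCAD software",
    "Team player with a focus on safety",
    "Collaborate with the design team",
    "Review cost and design documents",
]

gaps = rate_differences(postings_2004, postings_2014, ["team", "cost", "design"])
for term, gap in sorted(gaps.items()):
    print(f"{term}: {gap * 100:+.0f} percentage points")
```

A positive gap means the term is mentioned in a larger share of the newer postings; terms whose rate barely changes (here, “design”) are filtered out.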


Next, we compare the differences between tokens in junior 
and senior jobs in each discipline. Overall, more senior 
jobs across all disciplines mention project management or 
deal with advanced concepts of the field (either through ap- 
plications or research). Junior jobs appear to have more 
clerical work, computing-related responsibilities or mention 
less advanced concepts of the discipline (including testing, 
field work and lab work). We provide additional discipline- 
specific details below. 


Finance: Senior jobs require more technical knowledge of 
the field (“audit”, “invest”, “risk”, “management”). Junior
jobs have a more clerical (“document”, “arrange”, “English”) 
and computing (HTML, Java, databases) focus. Senior 
jobs are more likely to mention “commitment”, “dynamism”, 
“client” and “interaction”. Additionally, senior jobs in 2004 
mention more mathematical and statistical terms than ju- 
nior jobs in 2004. 


Health Studies: Senior jobs mention more research. Ju- 
nior jobs mention more field work. 


Arts: Senior jobs mention more project management (sug- 
gested by “manage”, “PowerPoint”, “client”, “workload”, 
“process”, “improvement”). Junior jobs mention more cler- 
ical work, “English”, “Web”, “research” and “customer service”.
Additionally, senior jobs in 2004 appear to include
more business analyst and editor roles than junior jobs in
2004.


Biology: Senior jobs mention more “research”, “hospital” 
and technical terms including “genetics”, “therapy”, “can- 
cer”, “cardiovascular”, “nanomedicine”, “biomaterial” and “in 
vitro”. Junior jobs are more likely to mention “office”, “as- 
sistant”, “support” and “campaign”. 


Environmental Studies: Senior job titles indicate more 
planner and analyst positions with more project manage- 
ment, policy-making and GIS terms mentioned in the de- 
scriptions. Junior job titles indicate more lab technician, 
inspector, and surveyor positions with more “lab”, “survey”, 
“test” and “outdoor” mentioned in the descriptions. 2004 
senior jobs additionally mention environmental concepts
including “ground”, “water”, “remedy”, “contaminate”, “river”,
“hydrology” and “hydrogeology”.


Chemical Engineering: Senior jobs mention a more in- 
dustrial working environment with more mentions of “en- 
ergy”, “product”, “design”, “cost”, “process”, “improvement” 
and “optimization”. Junior jobs mention laboratory-specific 
terms (“research”, “sample”, “record”) more often. In 2004,
senior jobs mentioned more chemical manufacturing terms.


Civil Engineering: Senior jobs mention more “modelling”, 
“design”, “client”, “interaction” and “software”. Junior jobs 
mention more “inspection”, “field”, “survey”, data recording 
and clerical work. 2014 senior jobs have more mentions of
project management.


Electrical Engineering: Senior jobs mention more electri- 
cal concepts (“power”, “circuit”, “embedded”, “distributed”, 
PCB (Printed Circuit Board), “sensor”, “chip”, “schematic”). 
Junior jobs mention more quality assurance and basic com- 
puting terms (“web”, “program”) as well as more clerical 
work. In addition, junior jobs in 2004 contain system ad- 
ministrator positions and senior jobs in 2004 mention more 
programming languages (C++ and C). 


Mechanical Engineering: Senior jobs are more likely
to mention project management, design and implementation.
Junior jobs have more clerical (e.g., “update”,
“arrange”, “email”, “written”), computing (marked by
“database”, “Web”, SQL, HTML, Java), field-work and
requirement-collection terms (“client”, “custom”, “meet”).
Junior jobs in 2004 do not mention client interaction;
instead, they mention testing.


4.4.3 Clustering Analysis 
Finally, we apply our clustering methodology to each discipline,
both for 2014 and 2004. Our clustering results provide
additional support for the findings in Sections 4.4.1 and 4.4.2.
Additionally, clustering reveals the different types of jobs
available in each discipline. We discuss these findings below.


Finance: In 2014, the largest clusters were: several clus- 
ters mentioning finance-specific skills such as “trade”, “eq- 
uity”, “tax”, “reconciliation”, “pension”, asset valuation, risk 
management, “forecast”, “causality” and “insurance” (63%); 
financial documentation (15%); and Web software develop- 
ment (10%). The jobs clustered under finance-specific skills 
were dominated by senior students, with the clerical (docu- 
mentation) and IT (web development) clusters dominated by 
junior students. This result aligns with our analysis of sig- 
nificant differences from the previous section. Furthermore, 
in 2004, the largest clusters relate to financial analysis and 
documentation (51%), actuarial work including “valuation” 
and “pension” (18%), “tax” and “audit” (14%), and “causal- 
ity” and “insurance” (5%). Thus, the 2004 clusters focus 
more on documentation and appear to describe a narrower 
range of jobs. All clusters except the last one mentioned 
have an equal split of junior and senior jobs. 


Health Studies: The largest clusters in 2014 are related 
to organizing community events (21%), recreation camps for 
adults and children (14%) and therapy (13%), and are dom- 
inated by junior jobs. Smaller clusters dominated by se- 
nior jobs are related to research, cancer patient care and 
advanced aspects of health studies, including biomechanics, 
anatomy and statistics. The 10 clusters in 2004 are similar 
but exhibit equal proportions of junior and senior jobs in 
recreation, leisure and patient care. 


Arts: The largest job clusters in 2014 include writing online 
content (24%), organizing events and providing customer 
service (22%), and writing, proofreading and summarizing 
research material (13%). These clusters have an almost 
even split of junior and senior jobs. Other clusters include 
project management (indicated by “stakeholder”, “Power- 
Point”, “present”), market analysis (“campaign”, “blog”, 
“promote”), content writing (“Drupal”, WCMS, standing for 
Web Content Management System), library liaisons and 
teaching (adult education, names of courses), which are 
dominated by senior students. Additionally, 52% of the jobs 
in 2004 fall in one cluster characterized by preparing En- 
glish material for education and research on various topics 
including policy and politics. Other clusters include publish- 
ing newsletters and articles (with “graphics”) (12%), office 
assistant positions (indicated by words such as “multitask”, 
“file”, “compile”, “photocopy” and “fax”) (8%), teaching and 
business analysis. Most of the clusters have an almost even 
split between junior and senior jobs. It appears that the 
Internet and social media have created new Arts jobs. 


Biology: Our clustering results identify jobs in various 
fields of this discipline (microbiology, molecular biology, ge- 
netics, biochemistry), using various techniques (chromatog- 
raphy, electrophoresis). 


Environmental Studies: The largest clusters in 2014
include project management (31%), education/research
(25%), survey (18%), urban planning (13%) and advanced
topics including GIS, cartography and geospatial analysis
(13%). On the other hand, half the jobs in 2004 mention
educating people (largest cluster). While 8% of the jobs are
related to advanced concepts, the other three clusters
involve urban planning (20%), hydrogeology (14%) and waste
water treatment (12%).


Chemical Engineering: Clustering 2014 Chemical Engineering
jobs reveals additional insight: there is a cluster of jobs related
to mechanical aspects of chemical plants, including the term
“equipment”. Additionally, a cluster with “nanotechnology”,
“lab”, “material” and “physics” includes 10% of 2014 jobs.
While 8% of the jobs are related to energy sources (including
“oil”, “gas”, “petroleum”, “sand” and “biofuel”), 5% of the
jobs revolve around “emission”, “environment”, “pollution”,
“regulation” and greenhouse gases. Like the 2014 clustering,
the 2004 clustering contains clusters related to the mechanics of
chemical plants, process improvement and research. It is
interesting to note the differences in the field of application
between the two years. While 2014 concentrates on nanotechnology,
energy and emissions, 2004 deals with pharmaceuticals and
waste water treatment.


Civil Engineering: Consistent with the previous section, 
junior students dominate the clusters including on-site field 
work (data collection and inspection), and senior students 
dominate the design clusters. 


Electrical Engineering: The types of jobs in 2014 include
system development (18%), web development (14%), electrical
drawing (12%), PCB and circuit design (12%), system
administration (9%), quality assurance (9%),
simulation/research (8%), power (8%), embedded systems (8%)
and research on advanced topics including transmitters,
effect on climate, power grids, etc. (2%). In line with the
findings of the previous section, there is a higher proportion
of junior jobs in computing and system administration,
and a higher proportion of senior jobs in core electrical
clusters including circuit design and embedded systems.
The main types of jobs in 2004 are related to power
systems (26%), IT (19%), project management (18%), circuit
design (15%), multimedia/graphics (6%), and
transmission/telecommunication (4%).


Mechanical Engineering: Three-quarters of both 2004 
and 2014 Mechanical Engineering jobs fall in the mechanical 
drawing cluster. While the other quarter of 2004 jobs men- 
tion plant-related terms including “assembly”, “weld” and 
“motor”, the other quarter of 2014 jobs is related to comput- 
ing (“hardware”, “automate”, C++, Java, C, “web”, “code”). 
Clustering 2014 jobs further reveals a 60-20-20 split among 
mechanical drawing, embedded systems and web develop- 
ment jobs. 


To summarize, our clustering methodology identifies
the types of available jobs in various disciplines.
Through frequent term analysis, we found that soft 
skills and basic computing skills appear to be impor- 
tant in all disciplines in the 2014 job dataset. 
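

The per-cluster figures reported in this section (the share of all jobs in a cluster and its junior/senior split) can be derived from cluster assignments with a few counts. The sketch below assumes that each job has already been assigned a cluster label by the clustering step and is tagged as junior or senior; the function name, the labels and the sample data are hypothetical.

```python
from collections import Counter


def summarize_clusters(jobs):
    """Summarize clusters from (cluster_label, level) pairs.

    level is "junior" or "senior". Returns
    {label: (share_of_all_jobs, junior_share, senior_share)},
    with all three values as whole-number percentages.
    """
    totals = Counter(label for label, _ in jobs)
    juniors = Counter(label for label, level in jobs if level == "junior")
    n = len(jobs)
    summary = {}
    for label, size in totals.items():
        junior_pct = round(100 * juniors[label] / size)
        summary[label] = (round(100 * size / n), junior_pct, 100 - junior_pct)
    return summary


# Made-up assignments, for illustration only.
jobs = (
    [("web development", "junior")] * 4
    + [("web development", "senior")] * 1
    + [("embedded systems", "junior")] * 1
    + [("embedded systems", "senior")] * 2
    + [("system administration", "junior")] * 2
)

for label, (share, junior, senior) in summarize_clusters(jobs).items():
    print(f"{label}: {share}% of jobs, {junior}% junior, {senior}% senior")
```

Computing the senior share as the complement of the rounded junior share guarantees that each split sums to 100%.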


5. DISCUSSION AND CONCLUSIONS 


In this paper, we presented a text mining methodology to
extract, compare and cluster important terms from free-text
job descriptions. Our method identifies required skills as
well as working environment and company culture descriptors.
To demonstrate the utility of our methodology, we
analyzed a dataset containing nearly 30,000 undergraduate
co-operative job postings from two years: 2004 and 2014.
Our main findings are as follows.


• As expected in an undergraduate co-op marketplace,
there are many assistant and junior positions, but less
so in 2014 than in 2004.


• Basic computing skills are needed in almost all disciplines
and at all levels. In other words, many non-IT
disciplines appear to be trending towards IT.


• Soft skills are mentioned frequently by job postings
from all disciplines, and more so in 2014 than in 2004.
For example, over all disciplines, “team” was mentioned
20 percentage points more often in 2014 than in 2004 (in 71%
vs. 51% of all job postings). These findings agree with
those reported in [3] [8] [25]. Besides teamwork, communication
and leadership were frequently mentioned in
job postings across all disciplines, with IT postings
additionally mentioning mindset-related terms (passion
and love for the work), Finance jobs mentioning
interpersonal relationships and Health Studies jobs
recruiting active students.


• Regardless of discipline, lower-year positions were and
are more clerical and/or involve more basic computing.
Upper-year positions tend to mention advanced
concepts and solution methods.


• We identified several trends over time by comparing
2004 jobs with 2014 jobs. For example, IT jobs
now emphasize mobile and cloud computing, Arts jobs
involve social media and Chemical Engineering jobs
mention sustainable energy.


• Job postings from different disciplines suggest different
working environments: plants in Chemical and
Mechanical Engineering, labs in Biology, and casual, fun
and collaborative environments in IT.


We emphasize that our results should be interpreted care- 
fully due to the following factors. 


• Diversity in size and age of companies, e.g., the IT
discipline has many modern companies that emphasize a
fun work culture, while other disciplines such as Finance
have more traditional companies which might
emphasize client relationships.


• Incorrect job descriptions which may not reflect the
true nature of the job; e.g., employers may write or
modify the job descriptions to suit the company’s public
image.


Nevertheless, we believe that our findings are of interest to 
students, employers and the institution. We provide several 
examples of actionable insights below. 


• We can provide students with a better understanding
of co-op opportunities in various disciplines and therefore
help them select the right academic program and
career.


• In particular, we suggest that all students, regardless of
discipline, acquire basic computer programming skills,
which should help them secure co-op positions in their
junior years.


• The institution can use our findings to manage the
expectations of junior students. As we showed, it may
take until senior years to obtain a co-op position that
fully utilizes advanced discipline-specific skills.


• The institution may use frequently appearing job attributes
and the clustering of jobs in various disciplines
to produce more effective promotional material for its
co-op programs and to help attract strong students.


• With the help of our findings, the institution can make
an informed decision about how to change academic
curricula to align with employers’ needs. For example,
as all disciplines seem to emphasize teamwork,
the institution can incorporate more team exercises in
the curriculum. Hackathons and other competitions
could be organized to foster passion and other mindset-related
skills for IT students, while mock client meetings
could be arranged for Finance students so that
they could hone their interpersonal skills. New tools
and methods may be introduced in courses when the
corresponding terms begin to appear in job descriptions.


• Employers may examine our findings to understand
which skills are in high demand and to understand the
extent of competition in the co-op market.


• Our lists of frequent attributes may be used to redesign
the way employers submit job postings. For
instance, separate fields (outside the job description)
may be added for required skills and company culture
descriptions, with drop-down lists populated with frequent
terms obtained through our methodology. Additionally,
our clustering methodology can be used to
segment the job descriptions to make it easier for students
to find jobs they are interested in.


Naturally, there is more data-driven work that can be done. 
The goal of a successful co-op system is to match the right 
student with the right employer. Thus, our long-term re- 
search objective is to minimize the gap between employers’ 
needs and students’ talents. In this paper, we focused on job 
descriptions, which provide an indication of what co-op em- 
ployers are looking for and what working environments they 
offer. In future work, we will characterize what students 
have to offer by mining resumes. Furthermore, we plan to 
study the gap between what employers want and what is be- 
ing taught in schools (e.g., by comparing job postings with 
course descriptions). Another interesting research direction 
is to determine if students are likely to obtain full-time jobs 
at one of their co-op employers after graduating. Finally, 
we are interested in comparing our job postings with those 
from other institutions worldwide. For example, the knowl- 
edge of foreign languages did not appear to be important in 
our dataset but it may be important in other countries.